load("~/Documents/Nonparametric Statisics/Project/clean data/functional/df_hour.RData")
First, we start with outlier detection in the functional setting.
fd_day <- fData(1:24,as_tibble(df_hour[,2:25]))
plot(fd_day, lwd = 3, xlab = "hour", ylab = "number of crashes", main = "hourly crashes in each day")
Functional bagplot:
hour_fbagplot <- fbplot(fd_day, main = "Magnitude outliers, hourly data")
The default F value is:
hour_fbagplot$Fvalue
[1] 1.5
No outliers are found:
df_hour[hour_fbagplot$ID_outliers,1:2]
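To see what Fvalue does, here is a simplified base-R sketch of the fence logic (an assumption-laden approximation: pointwise quartiles stand in for the band-depth-based central region that roahd::fbplot actually uses; the real method inflates the central 50% band by Fvalue):

```r
# Simplified fence sketch: flag curves leaving the pointwise IQR inflated by Fvalue.
# NOTE: this is only an approximation; roahd::fbplot builds the central region
# from band depths, not pointwise quartiles.
set.seed(1)
grid <- 1:24
center <- sin(2 * pi * grid / 24)
curves <- t(replicate(100, center + rnorm(24, sd = 0.3)))
curves <- rbind(curves, center + 3)  # one artificial magnitude outlier (row 101)

fence_outliers <- function(curves, Fvalue = 1.5) {
  q1 <- apply(curves, 2, quantile, 0.25)
  q3 <- apply(curves, 2, quantile, 0.75)
  lower <- q1 - Fvalue * (q3 - q1)
  upper <- q3 + Fvalue * (q3 - q1)
  which(apply(curves, 1, function(f) any(f < lower | f > upper)))
}

101 %in% fence_outliers(curves, Fvalue = 1.5)  # the shifted curve is flagged
```

A smaller Fvalue tightens the fence and flags more curves, which is why lowering F below the default 1.5 can reveal outliers that the default misses.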
Observations with 0 crashes create a numerical problem, so we add 1 to every count:
df_hour2 <- df_hour
df_hour2[, 2:25] <- df_hour2[, 2:25] + 1
fd_day2 <- fData(1:24, as_tibble(df_hour2[, 2:25]))
hour_fbagplot2 <- fbplot(fd_day2, main = "Magnitude outliers, hourly data",
                         adjust = list(N_trials = 20, trial_size = fd_day2$N,
                                       VERBOSE = TRUE))
* * * Iteration 1 / 20
* * * * beginning optimization
Error in stats::uniroot(cost_functional, interval = c(F_min, F_max), tol = tol, :
f() values at end points not of opposite sign
The F adjustment fails: uniroot cannot bracket a root of the cost functional at the interval endpoints.
Choosing a smaller F value instead:
hour_fbagplot2 <- fbplot(fd_day, main = "Magnitude outliers, hourly data", Fvalue = 0.9)
The chosen F value is:
hour_fbagplot2$Fvalue
[1] 0.9
The outlying days are:
df_hour[hour_fbagplot2$ID_outliers,1]
New Year's Eves and two unusual days: 2005-07-31 and 2009-02-11.
Outliergram:
invisible(out_hour <- outliergram(fd_day, adjust = FALSE, lwd = 3, display = FALSE))
The detected outliers are:
df_hour[out_hour$ID_outliers,1]
The plot of the original functions is not working here.
Adjusting F:
out_hour <- outliergram(fd_day, lwd = 3,
                        adjust = list(N_trials = 20, trial_size = 8 * fd_day$N,
                                      VERBOSE = TRUE),
                        display = FALSE)
* * * Iteration 1 / 20
* * * * beginning optimization
... (iterations 2/20 through 20/20 print the same two lines)
out_hour$Fvalue
[1] 2.055792
Nothing changed: the same outliers are detected.
df_hour[out_hour$ID_outliers,1:2]
Plotting the outliers and non-outliers manually:
par(mfrow=c(1,2))
plot(fd_day[out_hour$ID_outliers,],lwd = 1,main = "outliers",col = 2)
plot(fd_day[-out_hour$ID_outliers,],lwd = 1,main = "non outliers",col = 3)
There are two types of outliers: days with very few crashes and days with very many crashes.
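One way to confirm the low/high split is to compare each outlying day's total to the overall median total. A base-R sketch on simulated totals (on the real data, `totals` would be rowSums(df_hour[, 2:25]) and `outlier_ids` would be out_hour$ID_outliers; the simulated values below are assumptions for illustration):

```r
# Split outlying days into low- and high-crash groups relative to the median total.
set.seed(2)
totals <- c(rpois(100, lambda = 90),   # typical days
            rpois(3, lambda = 10),     # simulated low-crash outliers
            rpois(3, lambda = 300))    # simulated high-crash outliers
outlier_ids <- 101:106

low_days  <- outlier_ids[totals[outlier_ids] <  median(totals)]
high_days <- outlier_ids[totals[outlier_ids] >= median(totals)]
```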
Let's do some clustering:
hours <- 1:24
n <- fd_day$N
x <- t(matrix(rep(hours,n),24,n))
y <- as.matrix(df_hour[,2:25])
k <- 3
system.time(
  fdakma0der <- fdakmeans(x = x, y = y, n_clusters = k,
                          seeds = sample(1:n, k),
                          warping_class = "none",
                          metric = "pearson",
                          centroid_type = "mean",
                          distance_relative_tolerance = 1e-3,
                          add_silhouettes = FALSE,
                          parallel_method = 0L,
                          number_of_threads = 12)
)
Information about the data set:
- Number of observations: 6574
- Number of dimensions: 1
- Number of points: 24
Information about cluster initialization:
- Number of clusters: 3
- Initial seeds for cluster centers: 3405 5667 1171
Information about the methods used within the algorithm:
- Warping method: none
- Center method: mean
- Dissimilarity method: pearson
- Optimization method: bobyqa
Information about warping parameter bounds:
- Warping options: 0.1500 0.1500
Information about convergence criteria:
- Maximum number of iterations: 100
- Distance relative tolerance: 0.001
Information about parallelization setup:
- Number of threads: 12
- Parallel method: 0
Other information:
- Use fence to robustify: 0
- Check total dissimilarity: 1
- Compute overall center: 0
Running k-centroid algorithm:
- Iteration #1
* Size of cluster #0: 4049
* Size of cluster #1: 1239
* Size of cluster #2: 1286
... (iterations 2-8 omitted)
- Iteration #9
* Size of cluster #0: 2373
* Size of cluster #1: 2067
* Size of cluster #2: 2134
Active stopping criteria:
- The total dissimilarity did not decrease.
user system elapsed
37.655 0.394 38.089
autoplot(fdakma0der)
matplot(t(fdakma0der$center_curves[,1,]), type = 'l',
        main = 'cluster centroids', xlab = 'hour', ylab = 'crashes', col = 1:5, lwd = 3)
k <- 3
system.time(
  fdakma0der <- fdakmeans(x = x, y = y, n_clusters = k,
                          seeds = sample(1:n, k),
                          warping_class = "none",
                          metric = "l2",
                          centroid_type = "mean",
                          distance_relative_tolerance = 1e-3,
                          add_silhouettes = FALSE,
                          parallel_method = 0L,
                          number_of_threads = 12)
)
Information about the data set:
- Number of observations: 6574
- Number of dimensions: 1
- Number of points: 24
Information about cluster initialization:
- Number of clusters: 3
- Initial seeds for cluster centers: 1957 365 4321
Information about the methods used within the algorithm:
- Warping method: none
- Center method: mean
- Dissimilarity method: l2
- Optimization method: bobyqa
Information about warping parameter bounds:
- Warping options: 0.1500 0.1500
Information about convergence criteria:
- Maximum number of iterations: 100
- Distance relative tolerance: 0.001
Information about parallelization setup:
- Number of threads: 12
- Parallel method: 0
Other information:
- Use fence to robustify: 0
- Check total dissimilarity: 1
- Compute overall center: 0
Running k-centroid algorithm:
- Iteration #1
* Size of cluster #0: 3398
* Size of cluster #1: 2512
* Size of cluster #2: 664
- Iteration #2
* Size of cluster #0: 3510
* Size of cluster #1: 2379
* Size of cluster #2: 685
- Iteration #3
* Size of cluster #0: 3567
* Size of cluster #1: 2358
* Size of cluster #2: 649
Active stopping criteria:
- The total dissimilarity did not decrease.
user system elapsed
15.469 0.213 15.699
autoplot(fdakma0der)
matplot(t(fdakma0der$center_curves[,1,]), type = 'l',
        main = 'cluster centroids', xlab = 'hour', ylab = 'crashes', col = 1:5, lwd = 3)
Pearson is the preferred metric, since it takes shape into account rather than just magnitude.
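The point can be seen with two curves of identical shape but different magnitude: a correlation-based dissimilarity is zero while the L2 distance is large. A base-R sketch (fdacluster's exact "pearson" formula may differ slightly, but the scale invariance is the point):

```r
# Same shape, different magnitude: Pearson dissimilarity ignores affine scaling.
hour <- 1:24
y1 <- sin(2 * pi * hour / 24)
y2 <- 5 * y1 + 10                  # rescaled and shifted copy: identical shape

pearson_dissim <- 1 - cor(y1, y2)  # zero up to rounding
l2_dist <- sqrt(sum((y1 - y2)^2))  # clearly positive
```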
We need to choose k:
n_sub <- 50
sub_id <- sample(1:n,n_sub,replace = FALSE)
x_sub <- x[sub_id,]
y_sub <- y[sub_id,]
system.time(invisible(comparison_kmeans <- compare_caps(
x_sub,
y_sub,
n_clusters = 1:4,
metric = "pearson",
clustering_method = "kmeans",
warping_class = "none",
centroid_type = "mean",
cluster_on_phase = FALSE
)
)
)
plot(comparison_kmeans, validation_criterion = "wss", what = "mean",lwd = 3)
plot(comparison_kmeans, validation_criterion = "wss", what = "distribution")
plot(comparison_kmeans, validation_criterion = "silhouette", what = "mean")
plot(comparison_kmeans, validation_criterion = "silhouette", what = "distribution")
The criteria are not clear-cut, but k = 2 seems best.
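Since warping_class = "none" treats each curve as a plain 24-vector, ordinary stats::kmeans gives a cheap cross-check of the elbow. A sketch on simulated data with two well-separated groups (the simulated matrix is an assumption, not the crash data):

```r
# Elbow cross-check: within-cluster SS drops sharply at the true k, then flattens.
set.seed(3)
mat <- rbind(matrix(rnorm(50 * 24, mean = 0), nrow = 50),  # group 1
             matrix(rnorm(50 * 24, mean = 3), nrow = 50))  # group 2
wss <- sapply(1:4, function(k) kmeans(mat, centers = k, nstart = 10)$tot.withinss)
# the biggest drop is from k = 1 to k = 2: the elbow sits at 2
```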
k <- 2
system.time(
  fdakma0der <- fdakmeans(x = x, y = y, n_clusters = k,
                          seeds = sample(1:n, k),
                          warping_class = "none",
                          metric = "pearson",
                          centroid_type = "mean",
                          distance_relative_tolerance = 1e-3,
                          add_silhouettes = FALSE,
                          parallel_method = 0L,
                          number_of_threads = 20L)
)
autoplot(fdakma0der)
matplot(t(fdakma0der$center_curves[,1,]), type = 'l',
        main = 'cluster centroids', xlab = 'hour', ylab = 'crashes', col = 1:5, lwd = 3)
This is a clear distinction between working days and holidays.
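This reading can be checked by cross-tabulating cluster membership against the day of week. A self-contained sketch with simulated labels (on the real data, build the dates from the first column of df_hour and take the labels from the fitted object; the `memberships` field name and the date column are assumptions):

```r
# Cross-tabulate cluster labels against the day of week (1 = Monday ... 7 = Sunday).
dates <- seq(as.Date("2004-01-01"), by = "day", length.out = 366)
wday  <- as.integer(format(dates, "%u"))  # locale-independent weekday index
cluster <- ifelse(wday >= 6, 2L, 1L)      # simulated labels: weekends in cluster 2
table(wday, cluster)
```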
Trying k = 4:
k <- 4
system.time(
  fdakma0der <- fdakmeans(x = x, y = y, n_clusters = k,
                          seeds = sample(1:n, k),
                          warping_class = "none",
                          metric = "pearson",
                          centroid_type = "mean",
                          distance_relative_tolerance = 1e-3,
                          add_silhouettes = FALSE,
                          parallel_method = 0L,
                          number_of_threads = 20L)
)
Information about the data set:
- Number of observations: 6574
- Number of dimensions: 1
- Number of points: 24
Information about cluster initialization:
- Number of clusters: 4
- Initial seeds for cluster centers: 4874 3516 2556 4729
Information about the methods used within the algorithm:
- Warping method: none
- Center method: mean
- Dissimilarity method: pearson
- Optimization method: bobyqa
Information about warping parameter bounds:
- Warping options: 0.1500 0.1500
Information about convergence criteria:
- Maximum number of iterations: 100
- Distance relative tolerance: 0.001
Information about parallelization setup:
- Number of threads: 20
- Parallel method: 0
Other information:
- Use fence to robustify: 0
- Check total dissimilarity: 1
- Compute overall center: 0
Running k-centroid algorithm:
- Iteration #1
* Size of cluster #0: 2274
* Size of cluster #1: 750
* Size of cluster #2: 34
* Size of cluster #3: 3516
... (iterations 2-10 omitted)
- Iteration #11
* Size of cluster #0: 2104
* Size of cluster #1: 1255
* Size of cluster #2: 934
* Size of cluster #3: 2281
Active stopping criteria:
- The total dissimilarity did not decrease.
user system elapsed
55.476 0.725 56.717
autoplot(fdakma0der)
matplot(t(fdakma0der$center_curves[,1,]), type = 'l',
        main = 'cluster centroids', xlab = 'hour', ylab = 'crashes', col = 1:5, lwd = 3)
It is clear that the extra clusters only capture magnitude differences around the first peaks.
Trying k = 10:
k <- 10
system.time(
  fdakma0der <- fdakmeans(x = x, y = y, n_clusters = k,
                          seeds = sample(1:n, k),
                          warping_class = "none",
                          metric = "pearson",
                          centroid_type = "mean",
                          distance_relative_tolerance = 1e-3,
                          add_silhouettes = FALSE,
                          parallel_method = 0L,
                          number_of_threads = 20L)
)
Information about the data set:
- Number of observations: 6574
- Number of dimensions: 1
- Number of points: 24
Information about cluster initialization:
- Number of clusters: 10
- Initial seeds for cluster centers: 3165 5574 5201 4890 680 611 5666 351 1358 2385
Information about the methods used within the algorithm:
- Warping method: none
- Center method: mean
- Dissimilarity method: pearson
- Optimization method: bobyqa
Information about warping parameter bounds:
- Warping options: 0.1500 0.1500
Information about convergence criteria:
- Maximum number of iterations: 100
- Distance relative tolerance: 0.001
Information about parallelization setup:
- Number of threads: 20
- Parallel method: 0
Other information:
- Use fence to robustify: 0
- Check total dissimilarity: 1
- Compute overall center: 0
Running k-centroid algorithm:
- Iteration #1
* Size of cluster #0: 439
* Size of cluster #1: 11
* Size of cluster #2: 278
* Size of cluster #3: 786
* Size of cluster #4: 660
* Size of cluster #5: 722
* Size of cluster #6: 351
* Size of cluster #7: 403
* Size of cluster #8: 974
* Size of cluster #9: 1950
... (iterations 2-43 omitted)
- Iteration #44
* Size of cluster #0: 719
* Size of cluster #1: 661
* Size of cluster #2: 797
* Size of cluster #3: 772
* Size of cluster #4: 524
* Size of cluster #5: 314
* Size of cluster #6: 685
* Size of cluster #7: 553
* Size of cluster #8: 803
* Size of cluster #9: 746
Active stopping criteria:
- The total dissimilarity did not decrease.
user system elapsed
407.936 2.718 410.909
autoplot(fdakma0der)
matplot(t(fdakma0der$center_curves[,1,]), type = 'l',
        main = 'cluster centroids', xlab = 'hour', ylab = 'crashes', col = 1:k, lwd = 3)
Not much additional information here. For comparison, trying k = 10 with the l2 metric:
k <- 10
system.time(
  fdakma0der <- fdakmeans(x = x, y = y, n_clusters = k,
                          seeds = sample(1:n, k),
                          warping_class = "none",
                          metric = "l2",
                          centroid_type = "mean",
                          distance_relative_tolerance = 1e-3,
                          add_silhouettes = FALSE,
                          parallel_method = 0L,
                          number_of_threads = 20L)
)
Information about the data set:
- Number of observations: 6574
- Number of dimensions: 1
- Number of points: 24
Information about cluster initialization:
- Number of clusters: 10
- Initial seeds for cluster centers: 1888 2794 294 5269 4206 2609 2990 4427 3292 1192
Information about the methods used within the algorithm:
- Warping method: none
- Center method: mean
- Dissimilarity method: l2
- Optimization method: bobyqa
Information about warping parameter bounds:
- Warping options: 0.1500 0.1500
Information about convergence criteria:
- Maximum number of iterations: 100
- Distance relative tolerance: 0.001
Information about parallelization setup:
- Number of threads: 20
- Parallel method: 0
Other information:
- Use fence to robustify: 0
- Check total dissimilarity: 1
- Compute overall center: 0
Running k-centroid algorithm:
- Iteration #1
* Size of cluster #0: 179
* Size of cluster #1: 760
* Size of cluster #2: 284
* Size of cluster #3: 460
* Size of cluster #4: 316
* Size of cluster #5: 2314
* Size of cluster #6: 926
* Size of cluster #7: 704
* Size of cluster #8: 371
* Size of cluster #9: 260
... (iterations 2-7 omitted)
- Iteration #8
* Size of cluster #0: 730
* Size of cluster #1: 856
* Size of cluster #2: 1280
* Size of cluster #3: 203
* Size of cluster #4: 548
* Size of cluster #5: 1206
* Size of cluster #6: 347
* Size of cluster #7: 430
* Size of cluster #8: 74
* Size of cluster #9: 900
Active stopping criteria:
- The total dissimilarity did not decrease.
user system elapsed
121.775 1.155 122.930
autoplot(fdakma0der)
matplot(t(fdakma0der$center_curves[,1,]), type = 'l',
        main = 'cluster centroids', xlab = 'hour', ylab = 'crashes', col = 1:k, lwd = 3)
Not much additional information here either; we stick to k = 2.